Spoken Text Difficulty Estimation Using Linguistic Features
نویسندگان
چکیده
We present an automated method for estimating the difficulty of spoken texts for use in generating items that assess non-native learners’ listening proficiency. We collected information on the perceived difficulty of listening to various English monologue speech samples using a Likert-scale questionnaire distributed to 15 non-native English learners. We averaged the overall rating provided by three nonnative learners at different proficiency levels into an overall score of listenability. We then trained a multiple linear regression model with the listenability score as the dependent variable and features from both natural language and speech processing as the independent variables. Our method demonstrated a correlation of 0.76 with the listenability score, comparable to the agreement between the nonnative learners’ ratings and the listenability score.
منابع مشابه
Organization of Gatekeeping and Mental Framework in the System of Representation and Hierarchical Relational Structures of the Modern Society
Critical discourse analysis as a type of social practice reveals how linguistic choices enable speakers to manipulate the realizations of agency and power in the representation of action.The present study examines the relationship between language and ideology and explores how such a relationship is represented in the analysis of spoken text and to show how declarative knowledge, beliefs, attit...
متن کاملLinguistic Features for Quality Estimation
This paper describes a study on the contribution of linguistically-informed features to the task of quality estimation for machine translation at sentence level. A standard regression algorithm is used to build models using a combination of linguistic and non-linguistic features extracted from the input text and its machine translation. Experiments with EnglishSpanish translations show that lin...
متن کاملLatin Etymologies as Features on BNC Text Categorization
This paper presents an early experimental work on BNC Text Categorization (TC) with Latin etymologies as features, emphasis on spoken and written texts. Two aims achieved in this study: (1) to explore discriminative new linguistic features rather than lots of noise-bringing “bag-of-words” (BoW). (2) to build up a base step to represent texts in distinct types of linguistic features with differe...
متن کاملLinguistic Features of English Textese and Digitalk of Iranian EFL Students
This study aimed at investigating the English textese of Iranian EFL learners by scrutinizing the linguistic features through a qualitative design. In doing so, 700 messages were collected from 43 MA Iranian EFL learners of both genders. The features were categorized and analyzed calculating the frequency and percentage. The findings of the study showed that Iranian EFL students used different ...
متن کاملPredicting Emotion in Spoken Dialogue from Multiple Knowledge Sources
We examine the utility of multiple types of turn-level and contextual linguistic features for automatically predicting student emotions in human-human spoken tutoring dialogues. We first annotate student turns in our corpus for negative, neutral and positive emotions. We then automatically extract features representing acoustic-prosodic and other linguistic information from the speech signal an...
متن کامل